Slicing is a technique for extracting smaller subsets from a large dataset.
The pandas .loc indexer lets us select and slice data by row index labels and column names. It is a powerful tool for focusing on the rows and columns that matter for our analysis. Its general form is:
dataframe.loc[starting_row:end_row,starting_column:end_column]
The following code displays rows 2 to 5 and columns "Higher Education Institution" to "Enrolled_Post Graduate". With .loc, both end points of a slice are included:
data.loc[2:5,"Higher Education Institution": "Enrolled_Post Graduate"]
You can also display rows and columns that are not in sequence. To do so, list them inside square brackets [ ]:
data.loc[[3,5,7],["Higher Education Institution", "Enrolled_Under Graduate"]]
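As a minimal sketch of both forms of .loc selection, the snippet below builds a small made-up DataFrame and slices it by label. The name sample and all institution names and enrolment numbers are invented for illustration and are not taken from the dataset used in this tutorial.

```python
import pandas as pd

# Hypothetical sample data; all values are invented for illustration only.
sample = pd.DataFrame({
    "Higher Education Institution": ["Inst A", "Inst B", "Inst C", "Inst D", "Inst E", "Inst F"],
    "Specialisation": ["Arts", "Science", "Arts", "Law", "Science", "Law"],
    "Enrolled_Under Graduate": [100, 200, 150, 80, 120, 90],
    "Enrolled_Post Graduate": [10, 40, 25, 5, 30, 15],
})

# Label-based slice: rows 2 to 4 and a contiguous range of columns.
# Both end points are included.
block = sample.loc[2:4, "Specialisation":"Enrolled_Post Graduate"]

# Rows and columns that are not in sequence are passed as lists.
picked = sample.loc[[1, 3, 5], ["Higher Education Institution", "Enrolled_Post Graduate"]]

print(block)
print(picked)
```

Here block holds three rows and three columns, because the slice keeps the row labelled 4 and the column "Enrolled_Post Graduate".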
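The pandas .iloc indexer slices rows and columns in the same way as .loc, but it uses integer positions (starting from zero) for both rows and columns instead of labels and column names. Unlike .loc, an .iloc slice excludes its end position.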
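The difference from .loc can be sketched with a small made-up DataFrame. The name sample and its values are hypothetical and serve only to show position-based selection.

```python
import pandas as pd

# Hypothetical sample data; all values are invented for illustration only.
sample = pd.DataFrame({
    "Higher Education Institution": ["Inst A", "Inst B", "Inst C", "Inst D"],
    "Enrolled_Under Graduate": [100, 200, 150, 80],
    "Enrolled_Post Graduate": [10, 40, 25, 5],
})

# Rows at positions 1 and 2, columns at positions 0 and 1.
# As with ordinary Python slices, the end position is excluded.
by_position = sample.iloc[1:3, 0:2]

# Positions that are not in sequence are passed as lists.
picked = sample.iloc[[0, 3], [0, 2]]

print(by_position)
print(picked)
```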
The default index of a DataFrame is a sequence of integers starting from zero. To use one of the columns as the index instead, call .set_index as follows:
data.set_index("ColumnName",inplace=True)
data.set_index("Higher Education Institution",inplace=True)
Note: Higher Education Institution is now the index and is displayed differently in the DataFrame: its values appear in bold.
There are different ways to reset the index back to its original values. One common method is to rerun the line that reads the data from your source. Alternatively, you can use the .reset_index() function, which moves the current index back into an ordinary column and restores the default integer index.
Reset the index of our test example to its original values:
data.reset_index(inplace=True)
data.head()
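The round trip from default index to column index and back can be sketched as follows. The DataFrame sample and its values are hypothetical. This sketch assigns the results to new variables instead of using inplace=True, which is an illustrative choice and not the approach used above.

```python
import pandas as pd

# Hypothetical sample data; all values are invented for illustration only.
sample = pd.DataFrame({
    "Higher Education Institution": ["Inst A", "Inst B", "Inst C"],
    "Enrolled_Post Graduate": [10, 40, 25],
})

# Use the institution name as the index.
# Without inplace=True, set_index returns a new DataFrame.
indexed = sample.set_index("Higher Education Institution")

# With a label index, .loc looks rows up by institution name.
inst_b_postgrad = indexed.loc["Inst B", "Enrolled_Post Graduate"]

# reset_index moves the index back into an ordinary column
# and restores the default integer index 0, 1, 2.
restored = indexed.reset_index()

print(inst_b_postgrad)
print(restored)
```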
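When you need to summarise the data in a DataFrame, pandas makes calculating different statistics very simple. A statistics method such as mean(), sum(), min() or max() can be applied to a single column or to the whole DataFrame: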
dataframe["Column"].statistics_method()
dataframe.statistics_method()
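A minimal sketch of both forms, using a made-up DataFrame (the name sample and its numbers are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; all values are invented for illustration only.
sample = pd.DataFrame({
    "Enrolled_Under Graduate": [100, 200, 150, 80],
    "Enrolled_Post Graduate": [10, 40, 25, 5],
})

# Statistics for a single column.
mean_undergrad = sample["Enrolled_Under Graduate"].mean()
max_postgrad = sample["Enrolled_Post Graduate"].max()

# The same kind of method applied to the whole DataFrame returns
# one value per column, as a Series indexed by column name.
column_totals = sample.sum()

# describe() reports count, mean, std, min, quartiles and max in one table.
summary = sample.describe()

print(mean_undergrad, max_postgrad)
print(column_totals)
print(summary)
```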
Finding the unique (distinct) values in a column is a common step when analysing your data.
dataFrame["Column"].unique()
For example, to list the unique values in the column "Specialisation", call .unique() on that column:
data["Specialisation"].unique()
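As a small sketch on made-up data (the name sample and its values are hypothetical), the snippet below calls .unique() and, for comparison, two related pandas methods: .nunique() counts the distinct values and .value_counts() shows how often each one occurs.

```python
import pandas as pd

# Hypothetical sample data; all values are invented for illustration only.
sample = pd.DataFrame({
    "Specialisation": ["Arts", "Science", "Arts", "Law", "Science"],
})

# unique() returns the distinct values in order of first appearance.
specialisations = sample["Specialisation"].unique()

# nunique() counts the distinct values.
how_many = sample["Specialisation"].nunique()

# value_counts() shows how often each value occurs.
frequency = sample["Specialisation"].value_counts()

print(specialisations)
print(how_many)
print(frequency)
```

Note the parentheses: without them, .unique only refers to the method and does not run it.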
Pandas allows you to easily add new columns to the DataFrame. This is usually used to create a new calculated column.
Syntax to create a new column:
DataFrame["New Column"] = expression
data["Total Enrolled"] = data["Enrolled_Under Graduate"] + data["Enrolled_Post Graduate"]
data.head()
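The same pattern can be sketched on a small made-up DataFrame. The name sample, its values and the "Post Graduate Share" column are hypothetical and only illustrate that the expression on the right is evaluated row by row.

```python
import pandas as pd

# Hypothetical sample data; all values are invented for illustration only.
sample = pd.DataFrame({
    "Enrolled_Under Graduate": [100, 200, 150],
    "Enrolled_Post Graduate": [10, 40, 25],
})

# Element-wise sum of two columns, stored under a new column name.
sample["Total Enrolled"] = sample["Enrolled_Under Graduate"] + sample["Enrolled_Post Graduate"]

# A further calculated column built from the new one (hypothetical example).
sample["Post Graduate Share"] = sample["Enrolled_Post Graduate"] / sample["Total Enrolled"]

print(sample.head())
```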
To combine the rows of two DataFrames into a new one, use pd.concat:
newDataFrame = pd.concat([dataFrame1, dataFrame2])
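DataFrame.append was removed in pandas 2.0, so in current releases rows are combined with pd.concat. The sketch below uses two made-up DataFrames. The names first, second and combined and all values are hypothetical, and ignore_index=True is an optional choice that renumbers the rows of the result.

```python
import pandas as pd

# Hypothetical sample data; all values are invented for illustration only.
first = pd.DataFrame({
    "Higher Education Institution": ["Inst A", "Inst B"],
    "Enrolled_Post Graduate": [10, 40],
})
second = pd.DataFrame({
    "Higher Education Institution": ["Inst C"],
    "Enrolled_Post Graduate": [25],
})

# Stack the rows of both DataFrames.
# ignore_index=True renumbers the combined rows 0, 1, 2.
combined = pd.concat([first, second], ignore_index=True)

print(combined)
```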
with pd.ExcelWriter("NewData.xlsx") as writer:
    data.to_excel(writer, sheet_name="Sheet1")
The above lines store the DataFrame data in an Excel file named 'NewData.xlsx', in a sheet named 'Sheet1'. The file is saved automatically when the with block ends.